In this brief workflow, we start with gene-level read counts produced by featureCounts from aligned reads and perform data pre-processing and exploratory analyses. Base R and the edgeR package are used to import, filter, and organize the raw count data. We then finish with dimensionality reduction visualizations and quality control checks. These steps simplify downstream differential expression analysis and gene set testing.
This pipeline is inspired by Law et al. 2018 and Justin Colacino’s ‘Komen plexwell processing.R’ script (unpublished).
BiocStyle 2.25.1
The raw counts and sample metadata files are available from:
https://drive.google.com/drive/folders/1G2xSZfloEjk_cMuFKbglx8UmO7hVylcE
Tip: In RStudio, set your working directory by pressing Ctrl + Shift + H. Download the files listed below and place them in a folder in your working directory.
Komen_Jun22_seqwell_counts.txt
Seq_well_re_runs_June2022.xlsx
Packages used:
To install the packages, you can:
R version 4.2.0.

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install(c("edgeR", "AnnotationDbi",
    "org.Hs.eg.db", "org.Mm.eg.db", "Biostrings", "tidyverse", "gplots", "Glimma"))

First, let’s load all the packages we will use to process and analyze the data.
The data for this pipeline come from the June 22 Plexwell RNA-seqwell experiment. They include the processed data (counts) for the 5-3-22 Komen experiment re-runs and extra experimental sample wells for Rachel Morgan, Katelyn Polemi, and Linda Samuelson.
For reference, we re-processed the following samples for the Komen Plexwell Experiment on 5-3-2022:
| Pool Name | Sample | sample.id | Treatment | Description | Dose | lib.size |
|---|---|---|---|---|---|---|
| KCR 7518 | KCR7518 | CCATACTC.AGTTTCCT | BPA | Bisphenol A_1 | 0.1 | 4438319 |
| KCR 7518 | KCR7518 | TCCTTGGC.AGTTTCCT | BPA | Bisphenol A_2 | 0.1 | 4592853 |
| KCR 7518 | KCR7518 | TCACTCAC.AGTTTCCT | BPA | Bisphenol A_3 | 0.1 | 4830331 |
| KCR 7518 | KCR7518 | CAGGCTTC.AGTTTCCT | BPA | Bisphenol A_4 | 1 | 4988069 |
| KCR 7518 | KCR7518 | CCTACACA.AGTTTCCT | BPA | Bisphenol A_5 | 1 | 3906282 |
| KCR 7518 | KCR7518 | ATGGAACA.AGTTTCCT | BPS | Bisphenol S_8 | 10 | 3986349 |
| KCR 7889 | KCR7889 | CCATACTC.CCTCCATA | BPA | Bisphenol A_1 | 0.1 | 330286 |
| KCR 7889 | KCR7889 | TCCTTGGC.CCTCCATA | BPA | Bisphenol A_2 | 0.1 | 700937 |
| KCR 7889 | KCR7889 | TCACTCAC.CCTCCATA | BPA | Bisphenol A_3 | 0.1 | 664177 |
| KCR 7889 | KCR7889 | CAGGCTTC.CCTCCATA | BPA | Bisphenol A_4 | 1 | 602852 |
| KCR 7889 | KCR7889 | CCTACACA.CCTCCATA | BPA | Bisphenol A_5 | 1 | 590262 |
| KCR 7889 | KCR7889 | CGCGTGAT.CCTCCATA | BPA | Bisphenol A_6 | 1 | 223327 |
| KCR 7889 | KCR7889 | CATCTTCT.CCTCCATA | BPA | Bisphenol A_7 | 10 | 113064 |
| KCR 7889 | KCR7889 | ACATCCTT.CCTCCATA | BPA | Bisphenol A_8 | 10 | 145697 |
| KCR 7889 | KCR7889 | ACACAACA.CCTCCATA | BPA | Bisphenol A_9 | 10 | 463697 |
| KCR 7889 | KCR7889 | TTGGCTGC.CCTCCATA | BPS | Bisphenol S_1 | 0.1 | 4643601 |
| KCR 7889 | KCR7889 | ATGACACC.CCTCCATA | BPS | Bisphenol S_3 | 0.1 | 3850642 |
| KCR 7889 | KCR7889 | TGTTGCAC.CCTCCATA | BPS | Bisphenol S_4 | 1 | 1067957 |
| KCR 7889 | KCR7889 | ATTCTCCA.CCTCCATA | BPS | Bisphenol S_5 | 1 | 2892147 |
| KCR 7889 | KCR7889 | CGCAACAG.CCTCCATA | BPS | Bisphenol S_6 | 1 | 1833794 |
| KCR 7889 | KCR7889 | CTTCTGGC.CCTCCATA | BPS | Bisphenol S_7 | 10 | 522111 |
| KCR 7889 | KCR7889 | ATGGAACA.CCTCCATA | BPS | Bisphenol S_8 | 10 | 571686 |
| KCR 7889 | KCR7889 | CTAACAAC.CCTCCATA | BPS | Bisphenol S_9 | 10 | 1475635 |
| KCR 7889 | KCR7889 | CAGGCCAT.CCTCCATA | PFNA | PFNA_1 | 0.1 | 337540 |
| KCR 7889 | KCR7889 | CAACTCCG.CCTCCATA | PFNA | PFNA_2 | 0.1 | 518427 |
| KCR 7889 | KCR7889 | ACCGACCA.CCTCCATA | PFNA | PFNA_3 | 0.1 | 81384 |
| KCR 7889 | KCR7889 | GTGCGAGT.CCTCCATA | PFNA | PFNA_4 | 1 | 489221 |
| KCR 7889 | KCR7889 | ATGCCGCT.CCTCCATA | PFNA | PFNA_5 | 1 | 519967 |
| KCR 7889 | KCR7889 | TCCTCAGA.CCTCCATA | PFNA | PFNA_6 | 1 | 126051 |
| KCR 7889 | KCR7889 | ACGCTGCA.CCTCCATA | PFNA | PFNA_7 | 10 | 101283 |
| KCR 7889 | KCR7889 | CGATGGCA.CCTCCATA | PFNA | PFNA_8 | 10 | 98809 |
| KCR 7889 | KCR7889 | CAACCGTG.CCTCCATA | PFNA | PFNA_9 | 10 | 281834 |
| KCR 7953 | KCR7953 | CGCTCTTG.TATTTGAG | DMSO | Control 71 | 0 | 4005522 |
| KCR 7953 | KCR7953 | TGAACTCT.TATTTGAG | DMSO | Control 72 | 0 | 556342 |
| KCR 7953 | KCR7953 | ACTCACCG.TATTTGAG | Water | Control 61 | 0 | 2935786 |
| KCR 7953 | KCR7953 | CCTTATGT.TATTTGAG | Water | Control 62 | 0 | 889909 |
| KCR 7953 | KCR7953 | CCATACTC.TATTTGAG | BPA | Bisphenol A_1 | 0.1 | 4322 |
| KCR 7953 | KCR7953 | TCCTTGGC.TATTTGAG | BPA | Bisphenol A_2 | 0.1 | 2670 |
| KCR 7953 | KCR7953 | TCACTCAC.TATTTGAG | BPA | Bisphenol A_3 | 0.1 | 34642 |
| KCR 7953 | KCR7953 | CAGGCTTC.TATTTGAG | BPA | Bisphenol A_4 | 1 | 2731 |
| KCR 7953 | KCR7953 | CCTACACA.TATTTGAG | BPA | Bisphenol A_5 | 1 | 37554 |
| KCR 7953 | KCR7953 | CGCGTGAT.TATTTGAG | BPA | Bisphenol A_6 | 1 | 30332 |
| KCR 7953 | KCR7953 | CATCTTCT.TATTTGAG | BPA | Bisphenol A_7 | 10 | 4479 |
| KCR 7953 | KCR7953 | ACATCCTT.TATTTGAG | BPA | Bisphenol A_8 | 10 | 1764 |
| KCR 7953 | KCR7953 | ACACAACA.TATTTGAG | BPA | Bisphenol A_9 | 10 | 98017 |
| KCR 7953 | KCR7953 | TTGGCTGC.TATTTGAG | BPS | Bisphenol S_1 | 0.1 | 328345 |
| KCR 7953 | KCR7953 | GATGAGAA.TATTTGAG | BPS | Bisphenol S_2 | 0.1 | 165583 |
| KCR 7953 | KCR7953 | ATGACACC.TATTTGAG | BPS | Bisphenol S_3 | 0.1 | 119687 |
| KCR 7953 | KCR7953 | TGTTGCAC.TATTTGAG | BPS | Bisphenol S_4 | 1 | 14059 |
| KCR 7953 | KCR7953 | ATTCTCCA.TATTTGAG | BPS | Bisphenol S_5 | 1 | 312456 |
| KCR 7953 | KCR7953 | CGCAACAG.TATTTGAG | BPS | Bisphenol S_6 | 1 | 35607 |
| KCR 7953 | KCR7953 | CTTCTGGC.TATTTGAG | BPS | Bisphenol S_7 | 10 | 15369 |
| KCR 7953 | KCR7953 | ATGGAACA.TATTTGAG | BPS | Bisphenol S_8 | 10 | 2159 |
| KCR 7953 | KCR7953 | CTAACAAC.TATTTGAG | BPS | Bisphenol S_9 | 10 | 57722 |
| KCR 7953 | KCR7953 | TGGTGGAA.TATTTGAG | Cadmium_Chloride | Cadmium_Chloride_1 | 0.1 | 4142100 |
| KCR 7953 | KCR7953 | CTGTACGC.TATTTGAG | Cadmium_Chloride | Cadmium_Chloride_2 | 0.1 | 2517570 |
| KCR 7953 | KCR7953 | ACTCGAAT.TATTTGAG | Cadmium_Chloride | Cadmium_Chloride_7 | 10 | 854141 |
| KCR 7953 | KCR7953 | ACGAAGCG.TATTTGAG | DDE | DDE_1 | 0.1 | 1569364 |
| KCR 7953 | KCR7953 | CTCTCAGG.TATTTGAG | DDE | DDE_2 | 0.1 | 310580 |
| KCR 7953 | KCR7953 | CACCGCAA.TATTTGAG | DDE | DDE_3 | 0.1 | 1159972 |
| KCR 7953 | KCR7953 | TGCTCCGT.TATTTGAG | DDE | DDE_4 | 1 | 401552 |
| KCR 7953 | KCR7953 | CGAGCATT.TATTTGAG | DDE | DDE_5 | 1 | 70195 |
| KCR 7953 | KCR7953 | ACCGTTCC.TATTTGAG | DDE | DDE_6 | 1 | 1311310 |
| KCR 7953 | KCR7953 | TCAAGGAT.TATTTGAG | DDE | DDE_7 | 10 | 615546 |
| KCR 7953 | KCR7953 | CAAGTGAC.TATTTGAG | DDE | DDE_8 | 10 | 444754 |
| KCR 7953 | KCR7953 | CAGAGTGG.TATTTGAG | DDE | DDE_9 | 10 | 72186 |
| KCR 7953 | KCR7953 | CAGGCCAT.TATTTGAG | PFNA | PFNA_1 | 0.1 | 6665 |
| KCR 7953 | KCR7953 | CAACTCCG.TATTTGAG | PFNA | PFNA_2 | 0.1 | 3072 |
| KCR 7953 | KCR7953 | ACCGACCA.TATTTGAG | PFNA | PFNA_3 | 0.1 | 61567 |
| KCR 7953 | KCR7953 | GTGCGAGT.TATTTGAG | PFNA | PFNA_4 | 1 | 32518 |
| KCR 7953 | KCR7953 | ATGCCGCT.TATTTGAG | PFNA | PFNA_5 | 1 | 52864 |
| KCR 7953 | KCR7953 | TCCTCAGA.TATTTGAG | PFNA | PFNA_6 | 1 | 13358 |
| KCR 7953 | KCR7953 | ACGCTGCA.TATTTGAG | PFNA | PFNA_7 | 10 | 25890 |
| KCR 7953 | KCR7953 | CGATGGCA.TATTTGAG | PFNA | PFNA_8 | 10 | 1355 |
| KCR 7953 | KCR7953 | CAACCGTG.TATTTGAG | PFNA | PFNA_9 | 10 | 2671 |
| KCR 8195 | KCR8195 | CCACAATG.TCATATAT | Sodium_Arsenite | Sodium_Arsenite_7 | 10 | 4404878 |
| KCR 8195 | KCR8195 | TACAGAGT.TCATATAT | Sodium_Arsenite | Sodium_Arsenite_9 | 10 | 2395550 |
| KCR 8519 | KCR8519 | TACAGAGT.ATGTATCA | Sodium_Arsenite | Sodium_Arsenite_9 | 10 | 4458760 |
| KCR 8580 | KCR8580 | CCATACTC.CAATGCAA | BPA | Bisphenol A_1 | 0.1 | 190179 |
| KCR 8580 | KCR8580 | TCCTTGGC.CAATGCAA | BPA | Bisphenol A_2 | 0.1 | 402214 |
| KCR 8580 | KCR8580 | TCACTCAC.CAATGCAA | BPA | Bisphenol A_3 | 0.1 | 366393 |
| KCR 8580 | KCR8580 | CAGGCTTC.CAATGCAA | BPA | Bisphenol A_4 | 1 | 286710 |
| KCR 8580 | KCR8580 | CCTACACA.CAATGCAA | BPA | Bisphenol A_5 | 1 | 204553 |
| KCR 8580 | KCR8580 | CGCGTGAT.CAATGCAA | BPA | Bisphenol A_6 | 1 | 238993 |
| KCR 8580 | KCR8580 | CATCTTCT.CAATGCAA | BPA | Bisphenol A_7 | 10 | 569666 |
| KCR 8580 | KCR8580 | ACATCCTT.CAATGCAA | BPA | Bisphenol A_8 | 10 | 122211 |
| KCR 8580 | KCR8580 | ACACAACA.CAATGCAA | BPA | Bisphenol A_9 | 10 | 110789 |
| KCR 8580 | KCR8580 | ATGACACC.CAATGCAA | BPS | Bisphenol S_3 | 0.1 | 3715079 |
| KCR 8580 | KCR8580 | TGTTGCAC.CAATGCAA | BPS | Bisphenol S_4 | 1 | 3866428 |
| KCR 8580 | KCR8580 | ATTCTCCA.CAATGCAA | BPS | Bisphenol S_5 | 1 | 2662372 |
| KCR 8580 | KCR8580 | CGCAACAG.CAATGCAA | BPS | Bisphenol S_6 | 1 | 2456087 |
| KCR 8580 | KCR8580 | CTTCTGGC.CAATGCAA | BPS | Bisphenol S_7 | 10 | 501850 |
| KCR 8580 | KCR8580 | ATGGAACA.CAATGCAA | BPS | Bisphenol S_8 | 10 | 433595 |
| KCR 8580 | KCR8580 | CTAACAAC.CAATGCAA | BPS | Bisphenol S_9 | 10 | 412276 |
| KCR 8580 | KCR8580 | CAGAGTGG.CAATGCAA | DDE | DDE_9 | 10 | 4263103 |
| KCR 8580 | KCR8580 | CAGGCCAT.CAATGCAA | PFNA | PFNA_1 | 0.1 | 330140 |
| KCR 8580 | KCR8580 | CAACTCCG.CAATGCAA | PFNA | PFNA_2 | 0.1 | 274683 |
| KCR 8580 | KCR8580 | ACCGACCA.CAATGCAA | PFNA | PFNA_3 | 0.1 | 160945 |
| KCR 8580 | KCR8580 | GTGCGAGT.CAATGCAA | PFNA | PFNA_4 | 1 | 232231 |
| KCR 8580 | KCR8580 | ATGCCGCT.CAATGCAA | PFNA | PFNA_5 | 1 | 228138 |
| KCR 8580 | KCR8580 | TCCTCAGA.CAATGCAA | PFNA | PFNA_6 | 1 | 63019 |
| KCR 8580 | KCR8580 | ACGCTGCA.CAATGCAA | PFNA | PFNA_7 | 10 | 57682 |
| KCR 8580 | KCR8580 | CGATGGCA.CAATGCAA | PFNA | PFNA_8 | 10 | 49925 |
| KCR 8580 | KCR8580 | CAACCGTG.CAATGCAA | PFNA | PFNA_9 | 10 | 133796 |
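
The library sizes in the table above span several orders of magnitude, so it can help to flag shallow libraries before any filtering. The following is a minimal sketch, not part of the original pipeline: it copies six rows from the table into a small data frame and marks samples below a cutoff. The object names `reruns` and `low_depth` and the cutoff `min_lib_size` are hypothetical choices made for illustration only.

```r
# A few rows copied from the re-run table above (not the full table).
reruns <- data.frame(
  Sample    = c("KCR7518", "KCR7889", "KCR7953", "KCR8580", "KCR7953", "KCR7953"),
  sample.id = c("CCATACTC.AGTTTCCT", "CCATACTC.CCTCCATA", "CCATACTC.TATTTGAG",
                "CCATACTC.CAATGCAA", "CGCTCTTG.TATTTGAG", "CGATGGCA.TATTTGAG"),
  Treatment = c("BPA", "BPA", "BPA", "BPA", "DMSO", "PFNA"),
  lib.size  = c(4438319, 330286, 4322, 190179, 4005522, 1355),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff, chosen only for this illustration.
min_lib_size <- 1e5

# Keep the samples whose library size falls below the cutoff.
low_depth <- reruns[reruns$lib.size < min_lib_size,
                    c("Sample", "sample.id", "lib.size")]
print(low_depth)
```

With the full metadata sheet loaded, the same comparison could be applied to its `lib.size` column, with a cutoff suited to the experiment.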
*NOTE: When publishing your R Markdown file, these files will be deleted from the folder due to the knitr embedding and pandoc file conversion process. To prevent this, go to the folder where you saved the file(s), right-click each file -> ‘Properties’ -> select the ‘Read-only’ checkbox -> click ‘Apply’ -> exit by clicking ‘OK’.*

To get started, set up an RStudio project folder (e.g. ~/data directory/) where you have saved the data files. Import and read in the counts and sample metadata from above.
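Before importing, it can be useful to confirm that both files are where the import code expects them. This is a minimal sketch that assumes the files sit in the home directory, matching the `~/` paths used in the import step; the names `data_files` and `found` are introduced here for illustration.

```r
# Paths as used in the import step; adjust if your files live elsewhere.
data_files <- c(
  counts   = "~/Komen_Jun22_seqwell_counts.txt",
  metadata = "~/Seq_well_re_runs_June2022.xlsx"
)

# Expand "~" and check each file; keep the labels on the result.
found <- file.exists(path.expand(data_files))
names(found) <- names(data_files)

if (!all(found)) {
  message("Missing file(s): ", paste(data_files[!found], collapse = ", "))
}
```

If either entry of `found` is `FALSE`, the download or working directory should be fixed before running the import.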
# Read the raw counts data into R
seqdata <- read.table("~/Komen_Jun22_seqwell_counts.txt", header = TRUE)
# Read the sample metadata information into R
all.metadata <- readxl::read_xlsx("~/Seq_well_re_runs_June2022.xlsx")

The seqdata object contains the gene-level raw counts, with one column per sample. Let’s take a quick glance at the data. You can use the dim command to see how many rows and columns the data frame has. The colnames command will tell you the names of the columns. Use the head or tail command to preview the first or last 6 rows of the data frame, respectively.
dim(seqdata)
## [1] 95566 198
head(seqdata)
## # A tibble: 6 × 198
## Geneid Chr Start End Strand Length X.nfs…¹ X.nfs…² X.nfs…³ X.nfs…⁴ X.nfs…⁵
## <chr> <chr> <chr> <chr> <chr> <int> <int> <int> <int> <int> <int>
## 1 DDX11… HUMA… 1186… 1222… +;+;+… 1756 0 0 0 0 0
## 2 WASH7P HUMA… 1436… 1482… -;-;-… 2073 0 154 303 225 487
## 3 MIR13… HUMA… 2955… 3003… +;+;+… 1021 0 0 0 0 0
## 4 FAM13… HUMA… 3455… 3517… -;-;-… 1219 0 0 0 0 0
## 5 OR4G4P HUMA… 5247… 5331… +;+;+ 947 0 0 0 0 0
## 6 OR4G1… HUMA… 62948 63887 + 940 0 0 0 0 0
## # … with 187 more variables:
## # X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TCAAGGAT.TCATATAT_S538Aligned.sortedByCoord.out.bam <int>,
## # X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GTGCGTTC.ATGTATCA_S483Aligned.sortedByCoord.out.bam <int>,
## # X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CCACAATG.TCATATAT_S514Aligned.sortedByCoord.out.bam <int>,
## # X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GTGCGTTC.TCATATAT_S579Aligned.sortedByCoord.out.bam <int>,
## # X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTGGTCGT.TCATATAT_S519Aligned.sortedByCoord.out.bam <int>,
## # X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ACCGTTCC.ATGTATCA_S441Aligned.sortedByCoord.out.bam <int>, …
## # ℹ Use `colnames()` to see all variable names
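
The count columns shown above are named after the full BAM file paths, which makes them hard to read and hard to match against the metadata. The sketch below pulls out the index pair embedded in each name (for example `CCACAATG.TCATATAT`). It assumes, without confirmation from the source, that this pair corresponds to the `sample.id` column of the metadata table. The names `bam_cols`, `barcode_pattern`, and `barcodes` are hypothetical.

```r
# Two column names copied from the preview above.
bam_cols <- c(
  "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TCAAGGAT.TCATATAT_S538Aligned.sortedByCoord.out.bam",
  "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CCACAATG.TCATATAT_S514Aligned.sortedByCoord.out.bam"
)

# Capture the "<index1>.<index2>" pair that sits between "_" and "_S<number>Aligned".
barcode_pattern <- "^.*_([ACGT]+\\.[ACGT]+)_S[0-9]+Aligned.*$"
barcodes <- sub(barcode_pattern, "\\1", bam_cols)

print(barcodes)
```

On the real data, the same `sub()` call could be applied to `colnames(seqdata)[-(1:6)]`, which skips the six annotation columns (Geneid through Length).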
colnames(seqdata)
## [1] "Geneid"
## [2] "Chr"
## [3] "Start"
## [4] "End"
## [5] "Strand"
## [6] "Length"
## [7] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GTTATCGA.TCATATAT_S574Aligned.sortedByCoord.out.bam"
## [8] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TTCCATTC.ATGTATCA_S476Aligned.sortedByCoord.out.bam"
## [9] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CTCTCAGG.ATGTATCA_S437Aligned.sortedByCoord.out.bam"
## [10] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TACAGAGT.ATGTATCA_S420Aligned.sortedByCoord.out.bam"
## [11] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TGCTCCGT.ATGTATCA_S439Aligned.sortedByCoord.out.bam"
## [12] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TCAAGGAT.TCATATAT_S538Aligned.sortedByCoord.out.bam"
## [13] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GTGCGTTC.ATGTATCA_S483Aligned.sortedByCoord.out.bam"
## [14] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CCACAATG.TCATATAT_S514Aligned.sortedByCoord.out.bam"
## [15] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GTGCGTTC.TCATATAT_S579Aligned.sortedByCoord.out.bam"
## [16] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTGGTCGT.TCATATAT_S519Aligned.sortedByCoord.out.bam"
## [17] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ACCGTTCC.ATGTATCA_S441Aligned.sortedByCoord.out.bam"
## [18] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ACACAACA.ATGTATCA_S462Aligned.sortedByCoord.out.bam"
## [19] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CGCGTGAT.ATGTATCA_S459Aligned.sortedByCoord.out.bam"
## [20] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ATGACACC.TCATATAT_S543Aligned.sortedByCoord.out.bam"
## [21] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TCCTATCT.TCATATAT_S511Aligned.sortedByCoord.out.bam"
## [22] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_AACCAATC.TCATATAT_S505Aligned.sortedByCoord.out.bam"
## [23] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GCCAGTGT.ATGTATCA_S482Aligned.sortedByCoord.out.bam"
## [24] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TCTATTGG.TCATATAT_S506Aligned.sortedByCoord.out.bam"
## [25] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CATCTTCT.TCATATAT_S556Aligned.sortedByCoord.out.bam"
## [26] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CCGAGTTA.ATGTATCA_S393Aligned.sortedByCoord.out.bam"
## [27] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TTCACACT.ATGTATCA_S413Aligned.sortedByCoord.out.bam"
## [28] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTAGCCGA.TCATATAT_S571Aligned.sortedByCoord.out.bam"
## [29] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CGCTCTTG.ATGTATCA_S434Aligned.sortedByCoord.out.bam"
## [30] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ACTCACCG.ATGTATCA_S388Aligned.sortedByCoord.out.bam"
## [31] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ATGCCGCT.TCATATAT_S563Aligned.sortedByCoord.out.bam"
## [32] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TGGTACAG.TCATATAT_S527Aligned.sortedByCoord.out.bam"
## [33] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CCTGTTAC.ATGTATCA_S403Aligned.sortedByCoord.out.bam"
## [34] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CGCAACAG.ATGTATCA_S450Aligned.sortedByCoord.out.bam"
## [35] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CTAGCCGA.ATGTATCA_S475Aligned.sortedByCoord.out.bam"
## [36] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ACTCGAAT.ATGTATCA_S400Aligned.sortedByCoord.out.bam"
## [37] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CGCGTGAT.TCATATAT_S555Aligned.sortedByCoord.out.bam"
## [38] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_AAGTACCT.TCATATAT_S502Aligned.sortedByCoord.out.bam"
## [39] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GATGAGAA.ATGTATCA_S446Aligned.sortedByCoord.out.bam"
## [40] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CCTACACA.ATGTATCA_S458Aligned.sortedByCoord.out.bam"
## [41] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TCCTTGGC.TCATATAT_S551Aligned.sortedByCoord.out.bam"
## [42] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ACCTGAGC.TCATATAT_S523Aligned.sortedByCoord.out.bam"
## [43] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GATGAGAA.TCATATAT_S542Aligned.sortedByCoord.out.bam"
## [44] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_AACGCTTG.TCATATAT_S528Aligned.sortedByCoord.out.bam"
## [45] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TGGACAAC.TCATATAT_S575Aligned.sortedByCoord.out.bam"
## [46] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CATCTTCT.ATGTATCA_S460Aligned.sortedByCoord.out.bam"
## [47] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ATTCGCAG.TCATATAT_S522Aligned.sortedByCoord.out.bam"
## [48] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GGTGTGAC.TCATATAT_S513Aligned.sortedByCoord.out.bam"
## [49] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CGTAATTC.TCATATAT_S521Aligned.sortedByCoord.out.bam"
## [50] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ACATCCTT.ATGTATCA_S461Aligned.sortedByCoord.out.bam"
## [51] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CGATGGCA.ATGTATCA_S470Aligned.sortedByCoord.out.bam"
## [52] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CCAGGTAA.ATGTATCA_S429Aligned.sortedByCoord.out.bam"
## [53] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ACTCGAAT.TCATATAT_S496Aligned.sortedByCoord.out.bam"
## [54] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TGGACAAC.ATGTATCA_S479Aligned.sortedByCoord.out.bam"
## [55] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GTTAAGCA.ATGTATCA_S408Aligned.sortedByCoord.out.bam"
## [56] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTGTACGC.TCATATAT_S491Aligned.sortedByCoord.out.bam"
## [57] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CAGGCTTC.TCATATAT_S553Aligned.sortedByCoord.out.bam"
## [58] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CTAACAAC.ATGTATCA_S453Aligned.sortedByCoord.out.bam"
## [59] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TCAACTGA.ATGTATCA_S472Aligned.sortedByCoord.out.bam"
## [60] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GTTCGTCT.TCATATAT_S524Aligned.sortedByCoord.out.bam"
## [61] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CCTTATGT.TCATATAT_S485Aligned.sortedByCoord.out.bam"
## [62] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CACCGCAA.TCATATAT_S534Aligned.sortedByCoord.out.bam"
## [63] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CAACCGTG.TCATATAT_S567Aligned.sortedByCoord.out.bam"
## [64] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GTACCAGC.TCATATAT_S497Aligned.sortedByCoord.out.bam"
## [65] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ATTCCGTA.TCATATAT_S529Aligned.sortedByCoord.out.bam"
## [66] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CCGAGTTA.TCATATAT_S489Aligned.sortedByCoord.out.bam"
## [67] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTATTCCA.TCATATAT_S576Aligned.sortedByCoord.out.bam"
## [68] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CTCGTACA.ATGTATCA_S424Aligned.sortedByCoord.out.bam"
## [69] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TCCTCAGA.ATGTATCA_S468Aligned.sortedByCoord.out.bam"
## [70] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CGAGCATT.TCATATAT_S536Aligned.sortedByCoord.out.bam"
## [71] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CAGGAAGG.TCATATAT_S494Aligned.sortedByCoord.out.bam"
## [72] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TGAATGTG.TCATATAT_S507Aligned.sortedByCoord.out.bam"
## [73] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GGCTCCTA.ATGTATCA_S396Aligned.sortedByCoord.out.bam"
## [74] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ATGGAACA.TCATATAT_S548Aligned.sortedByCoord.out.bam"
## [75] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TCAGATAC.TCATATAT_S515Aligned.sortedByCoord.out.bam"
## [76] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CCATACTC.ATGTATCA_S454Aligned.sortedByCoord.out.bam"
## [77] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GTTATCGA.ATGTATCA_S478Aligned.sortedByCoord.out.bam"
## [78] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TGAATGTG.ATGTATCA_S411Aligned.sortedByCoord.out.bam"
## [79] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_AATGTGCC.ATGTATCA_S405Aligned.sortedByCoord.out.bam"
## [80] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TGTAAGAC.TCATATAT_S518Aligned.sortedByCoord.out.bam"
## [81] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GGAGCTAT.TCATATAT_S487Aligned.sortedByCoord.out.bam"
## [82] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TGGTGACT.ATGTATCA_S392Aligned.sortedByCoord.out.bam"
## [83] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CGCTCTTG.TCATATAT_S530Aligned.sortedByCoord.out.bam"
## [84] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CAAGTAGT.TCATATAT_S569Aligned.sortedByCoord.out.bam"
## [85] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TGAACTCT.ATGTATCA_S435Aligned.sortedByCoord.out.bam"
## [86] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_AACGCTTG.ATGTATCA_S432Aligned.sortedByCoord.out.bam"
## [87] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CACCGCAA.ATGTATCA_S438Aligned.sortedByCoord.out.bam"
## [88] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TGCTCCGT.TCATATAT_S535Aligned.sortedByCoord.out.bam"
## [89] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ATTCTCCA.TCATATAT_S545Aligned.sortedByCoord.out.bam"
## [90] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TTGGCTGC.ATGTATCA_S445Aligned.sortedByCoord.out.bam"
## [91] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ATGACACC.ATGTATCA_S447Aligned.sortedByCoord.out.bam"
## [92] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TCACTCAC.ATGTATCA_S456Aligned.sortedByCoord.out.bam"
## [93] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ATGGAACA.ATGTATCA_S452Aligned.sortedByCoord.out.bam"
## [94] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GTGCGAGT.ATGTATCA_S466Aligned.sortedByCoord.out.bam"
## [95] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TGTTGCAC.ATGTATCA_S448Aligned.sortedByCoord.out.bam"
## [96] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ACCGTTCC.TCATATAT_S537Aligned.sortedByCoord.out.bam"
## [97] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CAAGTGAC.ATGTATCA_S443Aligned.sortedByCoord.out.bam"
## [98] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GTTGACAG.TCATATAT_S500Aligned.sortedByCoord.out.bam"
## [99] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CACATGGT.TCATATAT_S512Aligned.sortedByCoord.out.bam"
## [100] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ATGGTCCG.ATGTATCA_S414Aligned.sortedByCoord.out.bam"
## [101] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CTTGTTGG.ATGTATCA_S421Aligned.sortedByCoord.out.bam"
## [102] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ACATCCTT.TCATATAT_S557Aligned.sortedByCoord.out.bam"
## [103] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ACGAAGCG.TCATATAT_S532Aligned.sortedByCoord.out.bam"
## [104] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ACACAACA.TCATATAT_S558Aligned.sortedByCoord.out.bam"
## [105] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CCATTGCG.TCATATAT_S508Aligned.sortedByCoord.out.bam"
## [106] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ATAGATCC.ATGTATCA_S390Aligned.sortedByCoord.out.bam"
## [107] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ATTCCGTA.ATGTATCA_S433Aligned.sortedByCoord.out.bam"
## [108] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CAGGCCAT.ATGTATCA_S463Aligned.sortedByCoord.out.bam"
## [109] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CTGCGAAC.ATGTATCA_S481Aligned.sortedByCoord.out.bam"
## [110] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CAGGCTTC.ATGTATCA_S457Aligned.sortedByCoord.out.bam"
## [111] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ACCGACCA.TCATATAT_S561Aligned.sortedByCoord.out.bam"
## [112] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TCAAGGAT.ATGTATCA_S442Aligned.sortedByCoord.out.bam"
## [113] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TACAGAGT.TCATATAT_S516Aligned.sortedByCoord.out.bam"
## [114] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ATGGTCCG.TCATATAT_S510Aligned.sortedByCoord.out.bam"
## [115] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GTGTCCAT.TCATATAT_S570Aligned.sortedByCoord.out.bam"
## [116] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TTGGCTGC.TCATATAT_S541Aligned.sortedByCoord.out.bam"
## [117] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CACAGTCT.ATGTATCA_S430Aligned.sortedByCoord.out.bam"
## [118] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTGCGAAC.TCATATAT_S577Aligned.sortedByCoord.out.bam"
## [119] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ACGCTGCA.ATGTATCA_S469Aligned.sortedByCoord.out.bam"
## [120] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTAACAAC.TCATATAT_S549Aligned.sortedByCoord.out.bam"
## [121] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CGTAATTC.ATGTATCA_S425Aligned.sortedByCoord.out.bam"
## [122] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TCTATTGG.ATGTATCA_S410Aligned.sortedByCoord.out.bam"
## [123] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ACTCACCG.TCATATAT_S484Aligned.sortedByCoord.out.bam"
## [124] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TCCTTGGC.ATGTATCA_S455Aligned.sortedByCoord.out.bam"
## [125] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ACGAAGCG.ATGTATCA_S436Aligned.sortedByCoord.out.bam"
## [126] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CAGAGTGG.ATGTATCA_S444Aligned.sortedByCoord.out.bam"
## [127] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CAACTCCG.TCATATAT_S560Aligned.sortedByCoord.out.bam"
## [128] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GTGCGAGT.TCATATAT_S562Aligned.sortedByCoord.out.bam"
## [129] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ATGCCGCT.ATGTATCA_S467Aligned.sortedByCoord.out.bam"
## [130] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CAGGCCAT.TCATATAT_S559Aligned.sortedByCoord.out.bam"
## [131] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CTATTCCA.ATGTATCA_S480Aligned.sortedByCoord.out.bam"
## [132] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CAGAGTGG.TCATATAT_S540Aligned.sortedByCoord.out.bam"
## [133] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TCACTCAC.TCATATAT_S552Aligned.sortedByCoord.out.bam"
## [134] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CTGTACGC.ATGTATCA_S395Aligned.sortedByCoord.out.bam"
## [135] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GTTCGTCT.ATGTATCA_S428Aligned.sortedByCoord.out.bam"
## [136] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TTCACACT.TCATATAT_S509Aligned.sortedByCoord.out.bam"
## [137] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CAGGAAGG.ATGTATCA_S398Aligned.sortedByCoord.out.bam"
## [138] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GTTAAGCA.TCATATAT_S504Aligned.sortedByCoord.out.bam"
## [139] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GGCTCCTA.TCATATAT_S492Aligned.sortedByCoord.out.bam"
## [140] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CCTACACA.TCATATAT_S554Aligned.sortedByCoord.out.bam"
## [141] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CGCAACAG.TCATATAT_S546Aligned.sortedByCoord.out.bam"
## [142] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTCGTACA.TCATATAT_S520Aligned.sortedByCoord.out.bam"
## [143] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ACTTCAAC.TCATATAT_S498Aligned.sortedByCoord.out.bam"
## [144] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_AATGTGCC.TCATATAT_S501Aligned.sortedByCoord.out.bam"
## [145] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GGAGCTAT.ATGTATCA_S391Aligned.sortedByCoord.out.bam"
## [146] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CGACTAGC.ATGTATCA_S407Aligned.sortedByCoord.out.bam"
## [147] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TCCTATCT.ATGTATCA_S415Aligned.sortedByCoord.out.bam"
## [148] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TGGTACAG.ATGTATCA_S431Aligned.sortedByCoord.out.bam"
## [149] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GTTGACAG.ATGTATCA_S404Aligned.sortedByCoord.out.bam"
## [150] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TGTTGCAC.TCATATAT_S544Aligned.sortedByCoord.out.bam"
## [151] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTTCTGGC.TCATATAT_S547Aligned.sortedByCoord.out.bam"
## [152] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CGACTAGC.TCATATAT_S503Aligned.sortedByCoord.out.bam"
## [153] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GTACCAGC.ATGTATCA_S401Aligned.sortedByCoord.out.bam"
## [154] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTAAGGCG.TCATATAT_S573Aligned.sortedByCoord.out.bam"
## [155] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CCATTGCG.ATGTATCA_S412Aligned.sortedByCoord.out.bam"
## [156] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ACGCTGCA.TCATATAT_S565Aligned.sortedByCoord.out.bam"
## [157] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CACAGTCT.TCATATAT_S526Aligned.sortedByCoord.out.bam"
## [158] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CGTCTGAA.ATGTATCA_S399Aligned.sortedByCoord.out.bam"
## [159] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CGAGCATT.ATGTATCA_S440Aligned.sortedByCoord.out.bam"
## [160] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CCTGTTAC.TCATATAT_S499Aligned.sortedByCoord.out.bam"
## [161] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CCACAATG.ATGTATCA_S418Aligned.sortedByCoord.out.bam"
## [162] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CAAGTGAC.TCATATAT_S539Aligned.sortedByCoord.out.bam"
## [163] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TTCCATTC.TCATATAT_S572Aligned.sortedByCoord.out.bam"
## [164] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_AAGTACCT.ATGTATCA_S406Aligned.sortedByCoord.out.bam"
## [165] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CCTTATGT.ATGTATCA_S389Aligned.sortedByCoord.out.bam"
## [166] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CAGAAGAA.ATGTATCA_S397Aligned.sortedByCoord.out.bam"
## [167] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTTGTTGG.TCATATAT_S517Aligned.sortedByCoord.out.bam"
## [168] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CGTCTGAA.TCATATAT_S495Aligned.sortedByCoord.out.bam"
## [169] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_AACCAATC.ATGTATCA_S409Aligned.sortedByCoord.out.bam"
## [170] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TGGTGGAA.ATGTATCA_S394Aligned.sortedByCoord.out.bam"
## [171] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CCATACTC.TCATATAT_S550Aligned.sortedByCoord.out.bam"
## [172] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CGATGGCA.TCATATAT_S566Aligned.sortedByCoord.out.bam"
## [173] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ATTCTCCA.ATGTATCA_S449Aligned.sortedByCoord.out.bam"
## [174] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ACTTCAAC.ATGTATCA_S402Aligned.sortedByCoord.out.bam"
## [175] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CACATGGT.ATGTATCA_S416Aligned.sortedByCoord.out.bam"
## [176] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TCCTCAGA.TCATATAT_S564Aligned.sortedByCoord.out.bam"
## [177] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CAAGTAGT.ATGTATCA_S473Aligned.sortedByCoord.out.bam"
## [178] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CAGAAGAA.TCATATAT_S493Aligned.sortedByCoord.out.bam"
## [179] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CTTCTGGC.ATGTATCA_S451Aligned.sortedByCoord.out.bam"
## [180] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GGTGTGAC.ATGTATCA_S417Aligned.sortedByCoord.out.bam"
## [181] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TGAACTCT.TCATATAT_S531Aligned.sortedByCoord.out.bam"
## [182] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TGGTGGAA.TCATATAT_S490Aligned.sortedByCoord.out.bam"
## [183] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CTCTCAGG.TCATATAT_S533Aligned.sortedByCoord.out.bam"
## [184] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ATTCGCAG.ATGTATCA_S426Aligned.sortedByCoord.out.bam"
## [185] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TGTAAGAC.ATGTATCA_S422Aligned.sortedByCoord.out.bam"
## [186] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GTGTCCAT.ATGTATCA_S474Aligned.sortedByCoord.out.bam"
## [187] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CTAAGGCG.ATGTATCA_S477Aligned.sortedByCoord.out.bam"
## [188] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CAACTCCG.ATGTATCA_S464Aligned.sortedByCoord.out.bam"
## [189] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_GCCAGTGT.TCATATAT_S578Aligned.sortedByCoord.out.bam"
## [190] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ACCTGAGC.ATGTATCA_S427Aligned.sortedByCoord.out.bam"
## [191] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_ACCGACCA.ATGTATCA_S465Aligned.sortedByCoord.out.bam"
## [192] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CAACCGTG.ATGTATCA_S471Aligned.sortedByCoord.out.bam"
## [193] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_TCAGATAC.ATGTATCA_S419Aligned.sortedByCoord.out.bam"
## [194] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TCAACTGA.TCATATAT_S568Aligned.sortedByCoord.out.bam"
## [195] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_TGGTGACT.TCATATAT_S488Aligned.sortedByCoord.out.bam"
## [196] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_ATAGATCC.TCATATAT_S486Aligned.sortedByCoord.out.bam"
## [197] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CCAGGTAA.TCATATAT_S525Aligned.sortedByCoord.out.bam"
## [198] "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_CTGGTCGT.ATGTATCA_S423Aligned.sortedByCoord.out.bam"
The seqdata object also contains metadata about the genes (one gene per row), stored in wide format.
The all.metadata file contains detailed information about our samples. We will need to use this to separate, annotate, and analyze our Plexwell samples by experiment later on.
metadata
As detailed by Phipson et al. (2020), we need to reformat the seqdata raw counts table into a format suitable for downstream analysis. To do so, we will create a new counts matrix for each experiment. Using the first two columns in the seqdata data frame, we can store the gene identifiers for a species (i.e. Geneid symbols) as the rownames and the sample ID information as the colnames. We will also add further annotation information about each gene later on.
Currently, the column names for the samples in the seqdata read counts table contain complex string identifiers. These are artefacts from the .fastq sequencing alignment and featureCounts post-alignment protocols.
X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_
X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_
Aligned.sortedByCoord.out.bam_[...]
Let’s parse these out for clarity before we create new count matrices.
Tip: When processing data, it is recommended to limit the number of intermediary variables you generate in your Global Environment. For instance, use the magrittr pipe operator (%>%), which dplyr re-exports, or the native R pipe operator (|>, requires R 4.1+). Chaining steps this way keeps your code concise and helps you avoid losing track of assigned variables.
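To illustrate the tip above, here is a minimal sketch that strips the prefix and suffix from two column names, first with intermediate variables and then with the native pipe. The `toy.names` vector and the regular expressions are assumptions made for this illustration; they are not part of the original pipeline, which uses `str_remove` on the full counts table.

```r
# Two column names copied from the seqdata output above (illustration only)
toy.names <- c(
  "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_GGAGCTAT.ATGTATCA_S391Aligned.sortedByCoord.out.bam",
  "X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_CCATACTC.TCATATAT_S550Aligned.sortedByCoord.out.bam"
)

# Approach 1: intermediate variables accumulate in the Global Environment
step1 <- sub("^.*6246\\.AT\\.[12]_", "", toy.names)
step2 <- sub("_S[0-9]+Aligned\\.sortedByCoord\\.out\\.bam$", "", step1)

# Approach 2: the native pipe (R 4.1+) chains the same steps without them.
# Naming `pattern` and `replacement` lets the piped value fill the `x` argument.
piped <- toy.names |>
  sub(pattern = "^.*6246\\.AT\\.[12]_", replacement = "") |>
  sub(pattern = "_S[0-9]+Aligned\\.sortedByCoord\\.out\\.bam$", replacement = "")

print(piped)
```

Both approaches should return the same barcode pairs; the piped version simply leaves no `step1` or `step2` objects behind.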
# Clean up the counts table so that only the unique ID barcodes remain as the
# sample colnames and the Geneid symbols become the rownames
modified.counts <- seqdata %>%
  `rownames<-`(.[, 1]) %>%
  select(-Geneid) %>% # Remove the Geneid column once the rownames are set
  select(starts_with('X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT') &
           ends_with('Aligned.sortedByCoord.out.bam')) %>%
  rename_with(~ str_remove(., 'X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.1_')) %>%
  rename_with(~ str_remove(., 'X.nfs.colacino.Sequencing.ONES.Komen_Seqwell.6246.AT_nova.fastqs_6246.AT.6246.AT.2_')) %>%
  rename_with(~ str_remove(., 'Aligned.sortedByCoord.out.bam')) %>%
  rename_with(~ str_remove(., '_S.*')) # Remove everything from '_S' onward in each column name
Tip: When processing data, it is also recommended to perform various ‘sanity checks’. That is, confirm your results are what you expect. For instance, use the head or tail function to take a quick glance at your data.
#Look at the output
modified.counts
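Beyond printing the table, a few programmatic checks can make the sanity check explicit. The sketch below runs on a small stand-in data frame, `toy.counts`, whose gene names and count values are hypothetical; the same checks could be pointed at `modified.counts`, assuming its column names are barcode pairs of the form shown earlier.

```r
# Hypothetical stand-in for modified.counts (gene names and counts are made up)
toy.counts <- data.frame(
  GGAGCTAT.ATGTATCA = c(10L, 0L, 5L),
  CCATACTC.TCATATAT = c(3L, 7L, 0L),
  row.names = c("GeneA", "GeneB", "GeneC")
)

# Quick glance at the first rows and the overall dimensions
head(toy.counts, n = 2)
dim(toy.counts)

# Fail loudly if any expectation about the cleaned table is violated
stopifnot(
  !anyDuplicated(colnames(toy.counts)),  # every sample barcode is unique
  !anyDuplicated(rownames(toy.counts)),  # every gene identifier is unique
  all(grepl("^[ACGT]{8}\\.[ACGT]{8}$", colnames(toy.counts))),  # only barcodes remain
  !anyNA(as.matrix(toy.counts))          # no missing counts
)
```

If any check fails, `stopifnot` halts with an error naming the failed expression, which is easier to notice than a malformed column name buried in printed output.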